Saliva microbiome, dietary, and genetic markers are associated with suicidal ideation in university students

Here, salivary microbiota and major histocompatibility complex (MHC) human leukocyte antigen (HLA) alleles were compared between 47 (12.6%) young adults with recent suicidal ideation (SI) and 325 (87.4%) controls without recent SI. Several bacterial taxa were correlated with SI after controlling for sleep issues, diet, and genetics. Four MHC class II alleles were protective for SI including DRB1*04, which was absent in every subject with SI while present in 21.7% of controls. Increased incidence of SI was observed with four other MHC class II alleles and two MHC class I alleles. Associations between these HLA alleles and salivary bacteria were also identified. Furthermore, rs10437629, previously associated with attempted suicide, was correlated here with SI and the absence of Alloprevotella rava, a producer of an organic acid known to promote brain energy homeostasis. Hence, microbial-genetic associations may be important players in the diathesis-stress model for suicidal behaviors.


Description of the cohort and survey instruments used. Mediterranean Diet Quality Index
(KIDMED) and Patient Health Questionnaire-9 (PHQ-9) responses are provided in Table 1. The cohort was comprised of 489 total students of whom 318 were female (65.0%) and 171 were male; the average age was 21.37 ± 4.14 years. Of the 489 subjects, 60 (12.3%) screened positive for SI, as defined by a score ≥ 1 on question 9 of the PHQ-9, and 109 (22.3%) scored ≥ 10 on the PHQ-9, which is the suggested cut-off for Major Depressive Disorder (MDD) with optimal sensitivity and specificity according to Kroenke et al. 38 . Of the 109 subjects (22.3%) who scored ≥ 10 on the PHQ-9, 64 scored within the moderate, 33 within moderately severe, and 12 within severe ranges, respectively. A total of 166 subjects (34.0%) scored in the mild range for depression (PHQ-9 score of 5-9), while 214 (43.8%) scored in the none to minimal depression range (PHQ-9 score of 0-4). The KIDMED questions, as presented to the participants in the electronic survey, are provided in Supplementary  Table 1.
Regarding the protective haplotypes, subjects with DQA1*03 were 7.3× more likely to not have SI, p = 0.001. Subjects without DQB1*03 were 2.2× more likely to screen negative for SI, χ 2 (1, N = 292) = 4.8, p = 0.028, than subjects with DQB1*03. Subjects who had at least one DRB1*15 allele were slightly more likely to have screened positive for SI, χ 2 (1, N = 289) = 3.4, p = 0.064. No individual with SI was homozygous for DQ8 χ 2 (1, N = 289) = 6.2, p = 0.0051, or had DR4-DQ8, χ 2 (1, N = 270) = 8.8, p = 0.003, the latter, a haplotype which was observed in 21.5% of subjects with no SI. Only 6.1% of subjects with SI had DQ8 compared to 28.6% of those with no SI; those with DQ8 were 6.2× more likely to not endorse SI, χ 2 (1, N = 285) = 6.2, p = 0.0051. Those with Table 1. Diet and depression distributions in the cohort. Distribution of Patient Health Questionnaire-9 PHQ-9 and KIDMED total scores across 489 subjects: Depression level categories defined as follows: screened positive for recent suicidal ideation (SI) based on a score of ≥ 1 on question 9 of the PHQ-9 (SI), regardless of PHQ-9 score; did not endorse recent SI and had a PHQ-9 total score of 0-4 (None or minimal depression), 5-9 (Mild depression), or ≥ 10 (Depression, moderate to severe). Distribution of PHQ-9 scores is then provided based on the stratification by Kroenke et al.: PHQ-9 total scores, 0-4 (None to minimal), 5-9 (Mild), 10-14 (Moderate), 15-19 (Moderately severe), ≥ 20 (Severe). Two subjects (one with SI and moderate depression and one with NOSI and mild depression) are not presented as their KIDMED had missing values, thus the KIDMED index could not be calculated.  Table 2. Human Leukocyte Antigen (HLA) haplotypes associated with incidence of suicidal ideation (SI). χ 2 and odds ratio (OR) results for HLA haplotypes that were near or significant. Fisher exact test was computed where χ 2 is not listed. Risk indicators associate with increased incidence of SI, while protective factors associate with decreased incidence. www.nature.com/scientificreports/ Although observed at low relative abundances and prevalence, significant differences were observed in the abundance of Treponema denticola (Fig. 1h, p = 0.037) and Fusobacterium naviforme (Fig. 1j, p = 0.0054), and near significant differences were observed across the four groups with respect to Leptotrichia hofstadii (Fig. 1g, p = 0.11), Dialister pneumosintes (Fig. 1i, p = 0.067), and Conservatibacter flavescens (Fig. 1k, p = 0.056), which were all seen at the highest abundance in individuals who were AA and did not have SI. No individual who had the G allele and expressed recent SI had Leptotrichia hofstadii in their saliva, and Dialister pneumosintes was only observed in one of these subjects.
Dietary confounders of saliva microbiome diversity. Saliva microbiome data was available for a total of 372 individuals. Microbial divergence of non-SI controls (N = 325) and individuals with SI (N = 47) was observed at the community-level by the Bray-Curtis dissimilarity measure (p = 0.047). Dissimilarity appears to be regulated in part by sex and regular pasta or rice consumption (Supplementary Table 1, p's < 0.05). Variance is also explained by daily consumption of fruit and whether the subject visits a fast-food hamburger restaurant more than once weekly (p's < 0.1), but not by overall KIDMED compliance score (p = 0.697).
Microbial diversity indices are lower in individuals with SI in the presence of sleep issues. Microbiome diversity differences in suicidal ideation (SI) endorsement concurrent with the presence of sleep issues were assessed. Shannon and Diversity Coverage indices (p's = 0.045 and 0.036, Supplementary Fig. 1b,d) were significantly lower in SI sleep . Diversity Gini Simpson, Inverse Simpson, and Evenness Simpson were lower in individuals with SI, although at p ≤ 0.1. Richness in NOSI sleep and SI sleep was not significantly different (Supplementary Fig. 1g,h).

Network analysis of salivary microbiota observed in individuals with sleep issues. Bacterial
networks of salivary genera were assessed in the saliva of individuals with sleep issues with correlation analysis ( Supplementary Fig. 2). Spearman correlation coefficients from the network analysis are provided in Supplementary Table 3.
Differentially expressed ASVs in individuals with SI. Using DESeq2, normalized base mean counts of the ASVs observed in the saliva community were compared in four methods: (1) the full cohort: those with SI compared to those with no SI, NOSI, and then selected cohorts, where those with SI were compared to (2) those with PHQ-9 scores above 10 (the published threshold for MDD), DEP, (3) those with PHQ-9 scores < 5, NODEP, and (4) those with PHQ-9 scores between 5 and 9, MILD), separately. Depending on the pairwise group comparison, between 240 and 439 ASVs were found to be differentially abundant between those with SI and those without (Supplementary Fig. 3; see Supplementary Tables 4-7 for full statistics of all significant ASVs). SI and NOSI groups demonstrated the greatest number of significantly different ASVs, while SI and DEP had the fewest, suggesting that their microbial communities were more similar, although still distinctive.
Among the taxa observed at higher abundances in individuals with SI were members of Actinobacteria, Bacteroidia ( Of the more dominant ASVs (those with a normalized base mean count > 40; Supplementary Fig. 3g) seen in any of the four models, ASVs belonging of Lautropia, Veillonella, and Megasphaera micronuciformis were among those more abundant in individuals with SI, while ASVs belonging to Alloprovetella, Porphyromonas, R. mucilaginosa, N. perflava, P. nanceiensis, S. oralis, and were more abundant in any of the comparative groups without SI. The largest increase in relative abundance in individuals with SI compared to NOSI was observed in ASV509 (Bacteroides dorei, 4.015 log 2 fold change), and ASVs that were the most elevated in NOSI included  www.nature.com/scientificreports/ ASV214 and ASV250 (Alloprevotella tannerae), ASV115 (Alloprevotella), and ASV147 (Prevotella melaninogenica), all log 2 fold changes > 3.1. In the NOSI model comparison, Veillonella ASV121 and ASV40 were both elevated in individuals with SI. According to NCBI BLAST, these ASVs have 99.8% and 95.0% homology with Veillonella nakazawae T1-7. ASV40 was 3.14× higher in individuals with SI versus those without (NOSI), with a normalized base mean of 125. Individuals with SI were 2× more likely to have a non-zero count of at least one of the ASVs, χ 2 (1, N = 372) = 5.1, p = 0.024. While 51.1% of SI had at least one of these ASVs, they were found at 34.2% prevalence in NOSI.
Differentially abundant closest species and genera in SI after controlling for sex, diet, and sleep. After propensity score matching on sex and the diet confounders were defined, 17 unique species were discovered to be differentially more abundant in the matched controls (NOSI) and 22 species and 21 genera were more abundant in individuals with SI at p < 0.1 (Table 4). Among the NOSI controls,18 genera were more abundant compared to those with SI (Table 4).
After additionally matching on sleep issues, 26 genera were discovered to be more abundant in controls and 8 were more abundant in individuals with SI (Table 5). Twenty species were more abundant in controls and 14 were more abundant in individuals with SI (Table 5). Several species and genera were consistent across the two propensity score matching paradigms, demonstrating the bacterial association is present with no regard to sleep as a potential confounder. In individuals with no SI, Granulicatella adiacens, Streptococcus parasanguinis, Fusobacterium periodonticum, Haemophilus influenzae, Caulobacter vibrioides, and Treponema amylovorum were elevated in both models, as were Alloprevotella, Porphyromonas, Gemella, Fusobacterium, Massilia, and Caulobacter. In individuals with SI, Megasphaera micronuciformis, Veillonella rogosae, Lactoboccous lactis, and Fusobacterium naviforme, as well as Megaphaera, Lactococcus, and Lautropia were elevated in both models.
Granulicatella was the single genus more abundant (log 2 Fold Change of 0.903, padj = 0.00913) in NOSI sleep controls than those with SI sleep , with a base mean of 241.86. Eight species were more abundant in NOSI sleep controls, including Capnocytophaga leadbetteri, Granulicatella adiacens, Prevotella melaninogenica, and Prevotella intermedia, while Megasphaera micronuciformis was more abundant in SI sleep (log 2 Fold Change of 1.203, padj = 0.0264) and with normalized base mean counts of 1800.6 and 1564.6 for P. melaninogenica and M. micronuciformis, respectively, indicating they are among the more dominant bacteria in the saliva ecosystem. See Supplementary Table 8 for the DESeq2 statistics for the significant genus and species.
The same analysis was performed within the cohort matched on sex, diet (consumption of fruit daily, pasta or rice at least five times weekly, nuts at least twice weekly), and whether sleep issues were present, to assess whether similar patterns in the microbiota were observed when holding sleep issues constant. At a threshold of 70% prevalence, 38 ASVs and 2,868,689 resultant sequences were retained with an out-of-bag error rate of 0% for the classification of SI when considering only those individuals with sleep issues (Fig. 2d). The most important ASVs for the classification are listed (Fig. 2e) Fig. 2f.
Salivary microbiota abundances are associated with HLA haplotypes linked to SI. Several notable bacterial species were associated with the HLA haplotypes linked to increased SI incidence in this cohort. Bacterial mean relative abundances are provided in Supplementary Table 9.
The mean relative abundance for Enterobacter kobei was observed at 0.0% but 0.35% in those who did not have the haplotype (Supplementary Table 9). Likewise, Serratia marcescens was not observed in DRB1*15 homozygotes and was present at 0.90% in those who did not have DRB1*15 and at 0.97% in those with only one allele.
Although at low abundance, after controlling for sex, dietary habits (e.g., daily consumption of fruit, regular consumption of nuts, dining at a fast-food hamburger restaurant more than once weekly, consumption of pasta or rice every day or at least five times a week), and sleep issues, several genera and species were significantly differentially abundant between those who were DRB1*15 (N = 78) and those who were not (N = 211). Of note, Tannerella was observed at significantly higher counts in those with DRB1*15, excepting one outlier who reported to eat sweets frequently (more than three times daily) and pasta or rice more than five times a week (Fig. 3a, p = 0.0011). Johnsonella was also observed higher frequency in those with DRB1*15 (Fig. 3b, p = 0.0059). Faecalibacterium prausnitzii, although rarely observed in the saliva, was not present in any individual endorsing SI, nor any individual with DRB1*15, both in the full cohort (N = 289) and when controlling for sex, diet, and sleep in a 2:1 propensity score match (N = 234, Fig. 3c).

Discussion
To our knowledge, this is the first investigation of oral microbiota associations with SI. Suicidal thoughts and behaviors are a strong risk factor for MDD onset 39 , and the converse is also true: MDD increases risk for SI. Discovering links to oral microbiota may help us elucidate novel therapeutic targets for MDD and other neuropsychiatric conditions. Accumulating evidence supports the view that depression is an inflammatory disease, which motivated us to consider HLA genes and salivary microbiota which could contribute to the etiology of  Table 4. Salivary bacterial genera and closest cultured relative species that are differentially abundant in suicidal ideation (SI) after controlling for sex and diet. Relative abundances of significant differentially abundant closest-species-relatives in individuals without SI (NOSI) and individuals with SI (SI) after propensity score matching on sex and diet factors. Two models, with 1:2 and 1:3 matching of SI to NOSI, were conducted and the Kruskal Wallis test results on the relative percent abundances presented. All species and genera presented here have a p value of < 0.05 in at least one of the two models.  Table 5. Salivary bacterial genera and closest cultured relative species that are differentially abundant in suicidal ideation (SI) after controlling for sex, diet, and the presence of sleep issues. Relative abundances of significant differentially abundant closest species-relatives in individuals without SI (NOSI) and individuals with SI (SI) after propensity score matching on sex, diet factors, as before, and additionally on sleep issues. Two models, with 1:2 and 1:3 matching of SI to NOSI, were conducted and the Kruskal Wallis test results on the relative percent abundances presented. All species and genera presented here have a p value of < 0.05 in at least one of the two models. Highlighted are the species and genera that were consistent with the previous propensity score matching paradigm, which did not control for sleep, demonstrating the bacterial association is present with no regard to sleep as a potential confound.  www.nature.com/scientificreports/ SI. Here, we identified novel HLA haplotypes that may be risk indicators or confer protection against SI, as well as salivary microbes that were differentially abundant across these haplotypes (Fig. 5). Across the analytical approaches applied in this investigation, Megasphaera micronuciformis was found to be elevated in individuals with SI. Among the genera that comprise the mixed species signature of periodontitis, confirmed by both culture-dependent and culture-independent methods 31 , is the lactic acid producing genus, Megasphaera, which is also elevated in periodontal disease 40 . Although certain Megasphaera species may be commensal, such as Megasphaera sp. Strain DISK18 which does not possess any virulence determining genes 41 , M. micronuciformis has been implicated in various human diseases including bacterial vaginosis 42 and cancer 43,44 , with evidence to support its role in increasing elevation and cell death 42 . It has been isolated from gut, oral, and female reproductive tract 42,43,45 . Megasphaera micronuciformis has been implicated in various cancers, where it is a putative microbial biomarker found commonly on tongue coatings in gastric cancer patients 43 , along with Selenomonas sputigena ATCC 35185, and is in elevated abundance in the gut microbiota of locally advanced lung cancer chemotherapy treatment non-responders 44 . M. micronuciformis has been shown to elevate inflammatory mediators, induce cytotoxicity, and lead to accumulation of cell membrane lipids in a model of vaginal ecosystems where Veilloneallaceae members including M. micronuciformis were shown to alter the microenvironment of the cervix 42 . Selenomonas was also found at higher abundance in individuals with SI. This is a genus that is elevated in generalized aggressive periodontal disease 46 . Another notable bacterium observed in higher abundances in individuals with SI included Bacteroides dorei (log2fold change of 4.0 when compared to all individuals without SI). This bacterium has been implicated as a potential biomarker for future autoimmunity, where it dominates the gut microbiome of infants prior to seroconversion 47 .
Two ASVs that were more likely to be present in individuals with SI and were found at elevated abundance have high shared sequence identity with a newly proposed species of Veillonella, Veillonella nakazawae T1-7 Figure 3. Selected genera and species that are associated with DRB1*1500 genotype and increased incidence of suicidal ideation (SI). Relative abundances of (a) Tannerella and (b) Johnsonella were compared across DRB1*1500 groups after controlling for sex, diet factors, and sleep using propensity score matching in a 1:2 ratio of those with or without DRB1*1500 (n = 156). (c) Relative abundance of ASVs with a closest cultured relative of Faecalibacterium prausnitzii. The two genotype groups are further categorized based on SI endorsement (Yes/ No) at right. This species was not observed in any individual endorsing SI and only ever present in those without SI and not harboring the DRB1*15 allele. For all taxa at (a-c), significance was calculated with a Wilcoxon test statistic between those with or without the DRB1*15 allele.  48 . Strain T1-7 produces acetic and propionic acids as fermentation end-products. Its catalase production and partial sequence for dnaK, rpoB, and citrate synthase (gltA) distinguish the species from other members of Veillonella, suggesting a novel lineage within the genus. The literature on the oral microbiome and its connection to depression is quite limited, with only one recent publication in this area. The investigation found that Neisseria spp. and Prevotella nigrescens were more abundant in 40 depressed young adults compared to matched controls 49 . This study did not control for sleep or diet factors, as we have done in the current report, and utilized a six-item instrument adapted from a diagnostic interview which did not capture suicidal behaviors or SI. We controlled for sleep and diet factors, and found decreased abundance in individuals with SI, while the authors found slightly increased abundance in their depression cohort. Saliva is a major nutrient source for the surface coating of the tongue, contributing glycoproteins, amino acids, and peptides. Some bacteria including Actinomyces, Veillonella, and Fusobacterium are capable of degrading short-chain fatty acids into ammonia, sulfur, and indoles, while the latter two and Lactobacillus are also capable of lactate conversion into weaker acids, which can neutralize acid in the mouth 50 . Although Fusobacterium periodonticum and Fusobacterium as a genus were both overall found in greater abundance in NOSI, one Fusobacteriaceae species, Fusobacterium naviforme, was found higher in individuals with SI. Veillonella atypica was observed in elevated abundances in individuals with SI. In the present report, two Lactobacilliaceae species, G. adiacens and S. parasanguinis, were observed at higher abundance in the absence of SI. All associations occurred after propensity matching on sex and diet, and irrespective of sleep as a potential confounder.
Among the bacteria associated more with the microbiome of the controls rather than those with SI were Granulicatella, R. mucilaginosa, Haemophilus parainfluenzae, Streptococcus, and N. perflava, which is considered a commensal. These findings are consistent with the current literature investigating the healthy oral cavity. R. mucilaginosa is a Gram-positive, facultative anaerobe with weak catalase activity and is a common resident of the natural microbiota of the upper respiratory tract and oropharynx 51,52 . R. mucilaginosa was detected at 3.41% of the aerobic microbiota of the human oral cavity and has been isolated from pit, fissure, buccal surface plaque, buccal membrane surface, soft tissues of the oral cavity, and the tongue dorsum, where predominantly resides 52 . In a recent WGS analysis of healthy individuals, H. parainfluenzae was found to be the most abundant www.nature.com/scientificreports/ species in the genus, as were P. melaninogenica, N. subflava, and R. dentocariosa of their respective genera, while Streptococci were the most abundant genus in the oral cavity 53 . Haemophilus, Streptococcus and Pseudomonas were found to be present on the tongue only in low abundance in patients with liver carcinoma and were more abundant in controls 54 . Likewise, Granulicella was significantly elevated in lung cancer chemotherapy responders versus non-responders 33 . Here Granulicatella, specifically Granulicatella adiacens, was characteristic of individuals without SI all major comparative approaches: (1) SI versus all individuals without SI (NOSI); (2) SI sleep versus NOSI sleep when assessing only those with sleep issues, irrespective of the inclusion or exclusion of NOSI individuals with moderate to moderately severe PHQ-9 scores; and (3) using propensity score matching on sex and diet variables, as well as sleep. Another Lactobacillales bacterium, Streptococcus parasanguinis, was discovered to be less abundant in individuals with SI, even after matching on sex, diet, and sleep. Streptococcus parasanguinis is a commensal bacterium of the oral cavity and gut 55 and an early colonizer of the oral cavity 56 with high prevalence. It possesses the ability to adhere to the oral cavity and bind to other bacteria directly due to its long peritrichous fimbrial surface structures 57 . Importantly, S. parasanguinis demonstrates antimicrobial properties in the oral cavity and has been shown to inhibit periodontopathogen growth in vitro through the production of hydrogen peroxide as the primary mechanism 58 , playing a role in the etiology of dysbiotic oral disease. Increasing evidence of autoimmunity or immune dysfunction being implicated in neurobiological, neurodevelopmental and neuropsychiatric disorders, such as in schizophrenia 14 , motivated us to investigate whether HLA genes encoding proteins that regulate the immune system are associated with increased SI risk or protection and to explore bacterial associations with those haplotypes. Whether HLA genes are involved in psychiatric and neurodevelopmental disorders is not established, with contrasting conclusions. The role of HLA in depression is not clear or may have a moderate effect 20 . HLA genes or C4 complement loci 59,60 have been implicated in intellectual disability, autism [15][16][17][18][19] , and schizophrenia 61 . Although the haplotype risk is inconclusive 62 , molecular mechanisms involving excessive complement activity may explain in part the synaptic reduction in schizophrenia 61 . Most recently, in a large dataset of over 65,000 individuals participating in the Danish Integrative Psychiatric Research Consortium, Nudel et al. were unable to replicate reported associations but found protective effects of DPB1*1505 on autism and intellectual disability 15 .
Here, those with DQ2.2 and DQ2.5 alleles were more than three times as likely to endorse SI, as were those who did not carry DR4-DQ8. None of the 55 subjects carrying the homozygous DR4-DQ8 allele expressed SI but was found in 51 of the 235 subjects (21.7%) without SI. DR4-DQ8 is associated with multiple autoimmune diseases including celiac disease and type 1 diabetes 63 . We are not aware of any connections between DR4-DQ8 and mental health. There are reports of connections between celiac disease and cognition, but these are controversial. The DR4-DQ8 positive association with no SI and celiac disease might predict that celiac disease would not be associated with mental health challenges. The DRB1*15 allele doubled the chances of SI in this cohort. DRB1*15 increases risk for multiple sclerosis [63][64][65][66] . A previous connection with a nervous disorder is consistent with a role for DRB1*15 in SI. DRB1*04 is negatively associated with schizophrenia 15 . DPB1*15 may be protective against intellectual disability and autism 15 . The associations in this cohort provide additional evidence that HLA may contribute to depressive etiologies.
In this work, we also assessed SNPs that have been associated with suicidality in published GWAS studies. Of the fourteen SNPs we tested, rs10437629 was significantly associated with incidence of SI in this cohort. This SNP has been previously identified as associated with attempted suicide 6 . We found that the homozygous GG or heterozygous AG alternative alleles were positively correlated with SI. This SNP is located within an intron of the gene coding for the membrane bound KIAA1549L protein, which suggests that the effect of this mutation may be regulatory. Furthermore, the KIAA1549L protein is enriched in the brain and parathyroid gland 67 . Succinate-producing Alloprevotella rava was only found in those without SI and only in those homozygous for rs10437629 AA alleles (Fig. 2b). Succinate has been shown to be produced by gut bacteria and is a known signaling molecule to the brain. Succinate has long been known as a short chain fatty acid that assists in glucose oxidation in the brain and in brain respiration 68,69 . Application of succinate to the brain can improve cerebral metabolism after traumatic brain injury 70,71 . Moreover, succinic acid is among several plasma metabolite neurotransmitters proposed to have diagnostic capability in MDD, found at significantly reduced levels in plasma of individuals with MDD in a metabolomics study 72 . Succinate levels have been reported to be 2.3 times higher in the saliva of males compared to the saliva of females 73 . The mechanism for higher levels of succinate in male versus female saliva is unknown. This is consistent within our cohort, where females were 1.94 times more likely to express SI than males.
Investigating suicidal behaviors at the university affords the opportunity to intervene in at-risk students prior to perceived and anticipated stress events, such as mid-term examinations, in a population where the incidence of suicidal ideation is high. Academic stress has been associated with increased interleukin-1B 74 , plaque 75 , and gingival inflammation 76 and it is unknown whether the bacteria identified here mediate these associations. The microbiota may provide a potential therapeutic target for brain health and neurological disorders given their role in neuroinflammation 25 . Although we did not assess inflammatory markers, the present study identifies specific bacteria in the mouth that appear to occur more frequently or at higher abundances in individuals with SI and bring to question the mechanism by which such bacteria might confer risk or protection against these phenotypes. Chronic low-grade inflammation may increase risk for MDD. Basal and LPS-stimulated inflammatory markers predicted MDD symptoms over nine years in the Netherlands Study of Depression and Anxiety (NESDA) cohort of MDD patients 77 . Environmental selection of microbiota based on resilience and diversity may be partly responsible for the rise of autoimmune and inflammatory disorders 78 but could also contribute to the etiology of depression. Understanding how these bacteria promote inflammation may give mechanistic insight into the diatheses of suicide, offering opportunity to characterize an individual's susceptibility to stressors, including precipitating and perpetuating factors. www.nature.com/scientificreports/ This work is the first to report associations of the oral microbiome with recent SI. It is important to note that the PHQ-9, upon which these results are based, only assessed for SI in the past 2 weeks. As such, it is possible that some individuals in the "No SI" control group had a history of SI or self-harm, which could alter the interpretation of results. History of self-harm was not addressed in this investigation, which would help elucidate whether the findings are specific to SI or to suicidal behaviors in general. Metrics such as prior history of SI and suicidal behaviors, antidepressant use, and comorbidities would benefit future extension and confirmation of these findings. Drug by gene interactions may increase risk of SI as a result of antidepressant treatment 10 . Data on antidepressant use must be included in future cohorts. Likewise, although several significant confounds of the microbiome, such as sex, sleep, and diet, were accounted for in this investigation, other data were not obtained, such as geography or ethnicity of the subjects. Because the present analyses are cross-sectional, we cannot speak to the chronology or causality of the factors that we have identified as potential risk indicators or being associated with increased incidence of SI. In this investigation, we assigned species names based on the closest cultured relative and 100% matching of the ASV using the 16S rRNA V3-V4 methodology. To be more confident in the precise taxonomy of the bacteria, metagenomic sequencing is warranted to interrogate bacterial functions and/ or other sequencing methods such as the rrn operon to obtain longer reads. Investigating the neurophysiological consequences of these microbes in the pathology of MDD and suicidal behaviors will require longitudinal cohorts and further study on the role in precipitating and perpetuating inflammation.
Here, we identified novel connections between diet, genetics, and saliva microbiome composition in suicidal ideation in young adults, expanding on the existing evidence supporting the diathesis of suicidality. To our knowledge, this is the first study to demonstrate a role for genetic and salivary bacterial interactions in the etiology of suicidal ideations. Our work warrants further investigation into the role that rs10437629 and HLA play in SI, and whether these genotypes influence suicidality risk via the saliva microbiota or independently. Evidence supports the notion that depression is an inflammatory disease 79 , with at least one-quarter of patients with depression demonstrating low-grade inflammation 80 . Elucidating how the oral microbiota mediate or contribute to the inflammatory and metabolic status of the oral milieu may inform novel interventions or preventions for MDD and SI, especially given the emerging and important connection between the mouth and brain.

Study design. Undergraduate and graduate students taking Microbiology and Cell Science courses at the
University of Florida voluntarily enrolled in the study between August 2018 and February 2020. The study was approved by and carried out in accordance with the University of Florida Institutional Review Board (IRB), with study approval IRB study #201801744. All methods were carried out in accordance with relevant guidelines and regulations.
Participants were recruited from Microbiology & Cell Science courses as approved by the IRB. The cohort was comprised of 489 total students of whom 318 were female (65.0%) and 171 were male; the average age was 21.37 ± 4.14 years. After providing their informed consent, participants completed the Mediterranean Diet Quality Index (KIDMED 81 ) and the Patient Health Questionnaire (PHQ-9 38 ) electronically through Research Electronic Data Capture (REDCap) 82 . The KIDMED was initially developed for children and adolescents, but has been validated in college students as a measure of Mediterranean diet adherence, with adequate test-retest reliability 83 . The PHQ-9 is widely used in clinical and research settings, with a sensitivity of 88% (potentially higher 84 ) and specificity of 88% for identifying major depression in respondents scoring ≥ 10. The 1-factor structure of the PHQ-9 has been validated in a sample of 857 U.S. college students, suggesting equivalent assessment across sex and racial/ethnic groups. Subjects provided a 2 mL saliva sample concurrently with the survey using the IsoHelix GeneFix DNA/RNA Collection kit (IsohelixTM DNA/RNA GFX-02 2 mL; Cell Projects Ltd., Harrietsham, UK), which stabilizes DNA and RNA long term at ambient temperature. Students who screened positive for suicidal ideation (SI) based on a non-zero response on question 9 of the PHQ-9 as well as those who scored > 19 on the PHQ-9 were immediately referred to the Counseling and Wellness Center at UF and those students who scored in the moderate to moderately severe range (10)(11)(12)(13)(14)(15)(16)(17)(18)(19) were informed of mental health resources.
Depression levels were calculated from the PHQ-9 total score as previously documented by Kroenke et al. 38 Binary (zero or non-zero) responses were considered in this manuscript for questions 3 and 9 of the PHQ-9, pertaining to sleep issues and SI, respectively. A non-zero response on PHQ-9 question 9 reflects that the participant denied having had "thoughts that [they] would be better off dead, or of hurting [themselves] in some way" for several days or more over the past 2 weeks 38 . A non-zero response on PHQ-9 question 3 reflects that the participant denied having "trouble falling or staying asleep, or sleeping too much" several days or more over the past 2 weeks.
Processing of saliva samples. Saliva samples were stored at ambient temperature per the manufacturer's instructions. These kits are designed to collect a 2 mL saliva sample into a 2 mL mix of proprietary stabilization and lysis buffer pre-filled into the collection kit. The optimized stabilization buffer allows for long-term storage of DNA and RNA at ambient temperature. DNA was extracted and stored at 4C. Extraction was carried out in accordance with the Isolation Protocol for GFX-02 GeneFix saliva samples, with the following revisions: incubation at 37 °C and an increased TE/DNA rehydration step to a duration of 95 min, and omission of purification steps 9-11 since purification was subsequently carried out with Qiagen or the Zymo Genomic DNA Clean & Concentrator kit. DNA was quantified, and 260/280 and 260/230 values were obtained with NanoDrop spectrophotometer (Thermo Scientific, Wilmington, DE). The DNA was then fractionated for 16S rRNA and human genetic analysis. Amplicon processing. Forward and reverse reads were merged using Qiime1 using the join_paired_ends. py script. Demultiplexing was performed either with Qiime1 (split_libraries_fastq.py and split_sequence_file_ on_sample_ids.py) or the BBMap pipeline.
The sample inference program DADA2 86 was employed in R for filtering, trimming, and taxonomic assignment of the amplicon sequence variants (ASVs) with single-nucleotide resolution. DADA2 has demonstrated fewer false positive sequence variants than OTU-based methods, and its resolution of biological differences allows for exact sequence inference. It has been suggested that use of ASVs with exact matching (or 100% identity) is an appropriate method for assigning bacterial species to amplicons generated by high-throughput 16S rRNA techniques 86 . However, species assignments here are based on the closest cultured relatives in the Silva database. Reads shorter than 420 bp were discarded, as were reads that matched against the phiX genome. Amplicons were truncated to remove low quality tails and the 21 bp forward primer was removed, resulting in reads with 399 nucleotides in length. Reads with at least one ambiguous nucleotide were filtered out. Reads were truncated at the first instance of a quality score ≤ 11. The maximum error allowed in a read was set to 1 87 . Error rates were learned by DADA2's machine learning algorithm. The resulting reads from the four sequencing pools were merged and chimeras removed. Between 95.47 and 98.43% of sequence variants remained in each pool after chimera removal.
In aggregate, the 372 samples for whom we had microbiome data represented 9352 unique ASVs (6125 without chimeras) with 96.83% of the sequence variants remaining after chimera removal. A total of 13,259,769 reads were present in the dataset, with an average of 35,644.54 (median 33,751) reads per sample. The fewest number of reads in a sample was 6506.
Taxonomy was assigned in DADA2 with the assignTaxonomy and addSpecies functions, using the Silva version 138 database 88 . Species-level assignments were exclusively reserved for exact matching between the ASVs and reference strains, which is considered optimal for identity thresholds using 16S rRNA sequencing data 89 . Full sequences corresponding to the 6135 ASVs are provided as a reference in Supplementary Table 10, along with the taxonomic assignment from SILVA.
Human genotyping and HLA imputations. Extracted DNA was purified using either a Qiagen kit or the Zymo Genomic DNA Clean & Concentrator kit (Genomic DNA Clean & Concentrator-10; Zymo Research, Irvine, CA USA) for high yield ultra-pure and large fragment recovery. DNA concentration was determined using Qubit dsDNA High Sensitivity (Invitrogen, Life technologies Inc., Carlsbad, CA). Purified samples were stored at 4 °C. 100 ng of DNA at a concentration of 5 ng/uL were plated and then sequenced using the Axiom Precision Medicine Research Array (PMRA) (Applied Biosystems, ThermoFisher Scientific) against a standard control for microarray analysis according to the PMRA manufacturer's instructions. The PMRA has a genomewide imputation grid that covers over 902,560 single-nucleotide polymorphism (SNP) markers. The sequencing was performed according to the protocol described in the Axiom 2.0 Assay Manual Workflow User Guide (Thermo Fisher Scientific Inc.) with the GeneTitan MC Instrument. We generated call codes from the SNP data of the 310 subjects whose samples were sequenced by the PMRA and met quality control cut-offs, using the Axiom Array Suite Software (Thermo Fisher Scientific Inc.).
HLA haplotypes were imputed from the SNP genotype data with the Axiom HLA Analysis Suite version 1.2 (Thermo Fisher Scientific Inc.). This imputation determines the HLA type of 11 major MHC Class I and Class II loci with 2-digit and 4-digit resolution over an extended major histocompatibility (MHC) region that affords accurate HLA typing from SNPs 90 , using a multi-population reference panel and HLA imputation model to infer HLA from Affymetrix genotyping arrays 91 . The HLA*IMP:02 model is tolerant of missing data and performs at 2-digit resolution with an accuracy of over 90% across most ethnicities (and with 95% accuracy at 4-digit resolution on diverse European panels). HLA markers were evaluated at 2-digit resolution. Human SNPs associated with increased incidence of SI and subsequent testing of associations with bacterial abundances. There are sixteen single nucleotide polymorphisms (SNPs) assessed by the PMRA and with an EBI trait inclusive of the key word "suicide" or "suicidal. " These SNPs are rs12373805, rs17387100, rs11143230, rs4918918, rs300774, rs358592, rs2462021, rs10437629, rs2610025, rs10748045, rs10448044, rs3781878, rs10854398, rs17173608, rs4732812, and rs7296262. The MAF of each SNP was calculated within the cohort (Supplementary Table 11). Fourteen of the SNPs were observed at a MAF ≥ 0.010, while rs17387100 and rs17173608 were below. The genotypes at these fourteen loci (excluding rs17387100 and rs17173608) were assessed in the cohort with respect to the presence of SI using 3 × 2 and 2 × 2 chi square tests based on dosage and then presence of the minor allele (homozygotes and heterozygotes combined). Kruskal Wallis statistics were then conducted to identify whether and which genera and species were differentially abundant in individuals carrying the risk allele for those SNPs that we found to be associated with SI.

Bacterial statistical analyses. Confounders of the saliva microbiome.
To determine which diet features differed in saliva microbial composition and whether there was microbial divergence among those with or without SI, a PERMANOVA based on Bray-Curtis distance matrix was conducted using the adonis function of the vegan R package on the compositionally transformed counts of the 6,135 ASVs that we identified across the 372 samples. The computation was carried out with 1000 permutations. Reads were compositionally transformed using the microbiome R package.
Microbiome diversity. Global ecosystem state measures including richness 92 , evenness , and diversity were calculated using the alpha function of the microbiome R package and compared across the N sleep cohort, comprised of subjects with sleep issues only (N sleep = 229 with SI sleep = 43 and NOSI sleep = 186), using a non-parametric Wilcoxon test in ggpubr.
Differentially expressed ASVs in individuals with SI. To assess differential expression of the 6,135 ASVs across SI (n = 47) and various depression categories, we first used an RNA-seq based method for assessing a negative binomial distribution by maximum likelihood estimation, DESeq2 93 which has been adapted for microbiome data 94 . For this approach, the log geometric means were calculated using the unnormalized counts of the 6,135 ASVs and applied a pseudo-count of 1. Independent filtering was applied in the DESeq2 package to optimize the adjusted p value by hypothesis weighting 95 . Four comparative groups were compared against SI using this approach: (1) all subjects without SI (NOSI, n = 325), (2) subjects without SI and with a PHQ-9 total score of 0-4 (no or minimal depression, NODEP, n = 157), (3) subjects without SI and with PHQ-9 of 5-9 (mild depression, MILD, n = 128), (4) subjects without SI and with a PHQ-9 score ≥ 10 (high likelihood of depression, DEP, n = 40). For each model, outliers were replaced and 2076 (NOSI), 1369 (DEP), 1813 (NODEP), 1746 (MILD) ASVs were refitted for differential abundance analyses. A local smoothed dispersion fit was applied to estimate the dispersions, and the Wald test used to assess statistical significance. Prevalence differences were calculated using a chi-square test in a SI (n = 47) versus NOSI (n = 325) comparison.
Network analysis of salivary microbiota. Genus-level networks were assessed using a Spearman correlation in MicrobiomeAnalyst 96 . To control for sleep issues, the correlation was carried out with the N sleep restricted cohort (N sleep = 229).
Differentially abundant closest species and genera in individuals with SI after controlling for sex, diet, and sleep. ASVs were combined at higher taxonomic ranks using the tax_glom function of phyloseq. 60.7% of ASVs were identifiable with a closest microbiome relative at the species level in SILVA. Of the 560 species observed in the saliva, 177 were found in at least 2% of the cohort and were retained for differential abundance analysis. Of the 227 genera observed in the saliva, 118 were present in at least 2% of the subjects.
To offset potential over-dispersion that could result from application of RNA-seq paradigms to microbiome datasets 97 , differential abundances were also modeled with nonparametric approaches after matching on potential confounders. Propensity score matching was employed using the matchit function of the MatchIt R package to match SI and NOSI samples on confounders of SI or the microbiome: sex, daily fruit consumption, regular www.nature.com/scientificreports/ consumption of pasta and rice, dining at a fast-food hamburger restaurant more than once weekly, and regular consumption of nuts, since we found that the former and latter are associated with increased risk of SI and the other variables are possible confounders for the saliva microbiome. Propensity score matching was employed at both a 1:2 and 1:3 ratio of SI to NOSI controls. Given that individuals with SI were significantly more likely to report sleep issues on the PHQ-9 (91.5% SI vs. 57.2% NOSI), propensity score matching was again applied to additionally account for sleep differences using a dichotomous classification of response to Q3 of the PHQ-9. In a third model controlling for HLA genetics, propensity score matching was applied on the previously mentioned variables (sex, five dietary habits, and presence of sleep issues) as well as DRB1*15, DQA1*01, DQB1*03, DPB1*05, and A*30 genotype at a 1:2 ratio of SI to NOSI controls. Only individuals with a genotype at all five haplotypes were included in this model, for an N of 93 (SI n = 31, NOSI n = 62). For the propensity score matching analyses, Kruskal-Wallis or Mann-Whitney tests were conducted in R across the 177 species' and 118 genera's percent relative abundances to identify significant differentially abundant features along each of the matching paradigms.
In a restricted dataset comprised only of those individuals who reported sleep issues (N sleep = 229), DESeq2 was run again at the genus and species levels to identify differentially abundant taxa. Of the 227 and 560 genera and species respectively, 75 genera and 219 species passed the internal filtering of the DESeq2 program. These were considered in the final dispersion estimates and model fitting. Those who reported sleep problems and were included in this restricted N sleep dataset.
Core microbial differences in individuals with SI after matching on sleep issues. Microbial communities were defined in the SI sleep and NOSI sleep cohorts selected in the 2:1 propensity score match. In the 47 SI sleep and 94 NOSI sleep individuals, a total of 1960 and 2881 unique ASVs were observed respectively. Prevalence thresholds were determined with out-of-bag error rates for the classification of SI and the ASVs comprising the prevalence threshold resulting in 0% error using the PIME package 98

in R.
Salivary species and genera associated with HLA haplotypes linked to SI. With a two-sided means test in SPSS v27 (IBM), the relative abundances of filtered species and genera were compared across the HLA haplotypes linked to SI risk (DQ2.2 and DQ2.5, DQ8, DR4-DQ8, and DRB1*15).

Data availability
The microbiome data and associated metadata used for this work are available from NCBI BioProject ID PRJNA846199. Human genotyping data that support the findings of this study are available from the corresponding author upon reasonable request.